{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "3881c7b4-7bcc-48a7-aaa0-01f8ce63b16e",
   "metadata": {},
   "source": [
    "# LlamaIndex Platform Demo"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "de23a434-bd51-4be3-b4a6-3b5c84284406",
   "metadata": {},
   "source": [
    "## Step 0: Setup environment config for platform"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "id": "b9ac844f-f623-4564-b9b9-53ecf5751f5d",
   "metadata": {},
   "outputs": [],
   "source": [
    "import os\n",
    "\n",
    "os.environ[\"LLAMA_CLOUD_API_KEY\"] = \"llx-\"\n",
    "os.environ[\"OPENAI_API_KEY\"] = \"sk-\""
   ]
  },
  {
   "cell_type": "markdown",
   "id": "5db501ac-28d0-4b6b-9565-048867e61ec4",
   "metadata": {},
   "source": [
    "## Step 1: Configure ingestion pipeline (data source, transformations)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "id": "da098c00-be39-49fc-b1fa-515cc9daad2f",
   "metadata": {},
   "outputs": [],
   "source": [
    "from llama_index.core.ingestion import IngestionPipeline\n",
    "from llama_index.core import SimpleDirectoryReader\n",
    "from llama_index.core.node_parser import SentenceSplitter\n",
    "from llama_index.embeddings.openai import OpenAIEmbedding"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "id": "ed608c95-a2b2-49b8-b852-2ff87e41bc79",
   "metadata": {},
   "outputs": [],
   "source": [
    "reader = SimpleDirectoryReader(input_files=['data_pg/source_files/source.txt'])\n",
    "docs = reader.load_data()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "id": "d9438a8a-3b43-4cf9-a4af-bbea09a71584",
   "metadata": {},
   "outputs": [],
   "source": [
    "pg_pipeline = IngestionPipeline(\n",
    "    project_name='paul graham', \n",
    "    name='essay',\n",
    "    documents=docs,\n",
    "    transformations=[\n",
    "        SentenceSplitter(),\n",
    "        OpenAIEmbedding(),\n",
    "    ]\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "88768650-9142-49e9-9858-418a14e1327c",
   "metadata": {},
   "source": [
    "## Running ingestion locally"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "60a16fe5-ab62-41a1-bbd4-a9a99a572193",
   "metadata": {},
   "source": [
    "Let's try running this locally first."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "id": "1b60f0b8-3724-4c3d-9722-4793fe3948fd",
   "metadata": {},
   "outputs": [],
   "source": [
    "nodes = pg_pipeline.run()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "a415a15e-f141-4d58-a980-65460fdbc0b4",
   "metadata": {},
   "source": [
    "This runs quickly, since data volume is small.  \n",
    "We can inspect the processed nodes, but limited by the notebook interface."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "id": "96fca25d-357a-45b7-bb6a-7f90dde2dbef",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Ingested 21 nodes\n"
     ]
    }
   ],
   "source": [
    "print(f'Ingested {len(nodes)} nodes')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "id": "7c3d57c9-e15f-45a6-bcb3-3acd1e8fd4d1",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Node ID: badb3184-1bac-4674-a28d-b9d60ea1d57c\n",
      "Text: What I Worked On  February 2021  Before college the two main\n",
      "things I worked on, outside of school, were writing and programming. I\n",
      "didn't write essays. I wrote what beginning writers were supposed to\n",
      "write then, and probably still are: short stories. My stories were\n",
      "awful. They had hardly any plot, just characters with strong feelings,\n",
      "which I ...\n"
     ]
    }
   ],
   "source": [
    "print(nodes[0])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "id": "8cff5dd5-3fa9-4933-9872-fa39c0dace47",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "{'file_path': 'data_pg/source_files/source.txt',\n",
       " 'file_name': 'source.txt',\n",
       " 'file_type': 'text/plain',\n",
       " 'file_size': 75084,\n",
       " 'creation_date': '2023-12-15',\n",
       " 'last_modified_date': '2023-12-15',\n",
       " 'last_accessed_date': '2024-02-09',\n",
       " 'ref_doc_id': '620ee62d-3a14-41f6-a9ba-086a7ea5ddcb',\n",
       " 'document_title': ''}"
      ]
     },
     "execution_count": 8,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "nodes[0].metadata"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "672221a0-bc12-46bf-8ce3-d427289f7b3c",
   "metadata": {},
   "source": [
    "### Key challenges for AI engineer\n",
    "- hard to visualize intermediate data\n",
    "- hard to tune parameters and understand impact on downstream performance"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "b71130b0-5671-4001-a6a9-b56545d77054",
   "metadata": {},
   "source": [
    "## Platform Playground"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "fbc13ce4-1906-4fd7-a95e-55c25f0f905e",
   "metadata": {},
   "source": [
    "### Register pipeline"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "eac4742e-7272-4c8a-b6ef-80daac3e7de4",
   "metadata": {},
   "source": [
    "We can register the pipeline to Platform Playground, where we can easily experiment with different parameter configurations and evaluate downstream performance."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "id": "a0414ece-cd3b-499a-8055-1e3ef0e9dbf2",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Pipeline available at: https://cloud.llamaindex.ai/project/5e1a294d-a1f9-4c47-96d9-a2c71bd7acbf/playground/5bb3ff13-63df-43d2-a25e-755890092c00\n"
     ]
    }
   ],
   "source": [
    "pg_pipeline_id = pg_pipeline.register()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "f9dab0a7-48cb-4739-aa36-91567b0cc33c",
   "metadata": {},
   "source": [
    "### Pull eval dataset from Llama Hub"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "id": "7264a346-817e-44c1-b7af-16c13dbc098d",
   "metadata": {},
   "outputs": [],
   "source": [
    "from llama_index.core.llama_dataset import download_llama_dataset\n",
    "from llama_index.core.evaluation.eval_utils import upload_eval_dataset"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "id": "8facbeb4-8136-4f64-a57f-56f8525c80ae",
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "rag_dataset, _ = download_llama_dataset('PaulGrahamEssayDataset', download_dir='data_pg')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 18,
   "id": "bb39d9df-2d37-48ce-9355-46f24a96949e",
   "metadata": {},
   "outputs": [],
   "source": [
    "# or, load from downloaded dir\n",
    "# from llama_index.llama_dataset import LabelledRagDataset\n",
    "# rag_dataset = LabelledRagDataset.from_json(\"./data_pg/rag_dataset.json\")\n",
    "questions = [example.query for example in rag_dataset.examples[:5]]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 19,
   "id": "a9b78a8b-b811-4918-b2c4-92b8baa12707",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "1. In the essay, the author mentions his early experiences with programming. Describe the first computer he used for programming, the language he used, and the challenges he faced.\n",
      "2. The author switched his major from philosophy to AI during his college years. What were the two specific influences that led him to develop an interest in AI? Provide a brief description of each.\n",
      "3. In the essay, the author discusses his initial interest in AI and his eventual disillusionment with it. According to the author, what were the two main influences that initially drew him to AI and what realization led him to believe that the approach to AI during his time was a hoax?\n",
      "4. The author mentions his shift of interest towards Lisp, a programming language. What reasons does he provide for this shift and how did he further his understanding of Lisp?\n",
      "5. In the essay, the author mentions his interest in both computer science and art. Discuss how he attempts to reconcile these two interests during his time in grad school. Provide specific examples from the text.\n"
     ]
    }
   ],
   "source": [
    "for ind, question in enumerate(questions):\n",
    "    print(f\"{ind + 1}. {question}\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "0c45079d-2047-4a72-8d37-42bc3b648eb8",
   "metadata": {},
   "source": [
    "### Upload eval dataset"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "5c58ef7f-fe2a-4d99-adad-3beb2cb36fbf",
   "metadata": {},
   "source": [
    "Let's upload a list of evaluation questions we have curated to help us iterate on the ingestion pipeline."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 20,
   "id": "faa16749-79fc-48bc-a4e4-540e8de9d333",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Uploaded 5 questions to dataset AI generated - 5 questions\n"
     ]
    },
    {
     "data": {
      "text/plain": [
       "'dbfc40dc-42d3-4338-84ba-08301591fba5'"
      ]
     },
     "execution_count": 20,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "upload_eval_dataset(\n",
    "    project_name='paul graham',\n",
    "    dataset_name='AI generated - 5 questions',\n",
    "    questions=questions,\n",
    "    overwrite=True,\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "5d2bc3df-3394-4bc9-801a-ede3959543a4",
   "metadata": {},
   "source": [
    "### Download pipeline"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "d32aca15-7607-4ade-8eab-67cec4c0294f",
   "metadata": {},
   "source": [
    "After we tune our pipeline with the desired evals, we can download the config back to local development."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 21,
   "id": "f8b0635d-8694-4467-bcc9-ec75f4bd3594",
   "metadata": {},
   "outputs": [],
   "source": [
    "new_pg_pipeline = IngestionPipeline.from_pipeline_name(\n",
    "    project_name='paul graham', \n",
    "    name='essay',\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 22,
   "id": "c5f283dc-3aff-4144-9273-11896b2e0d3c",
   "metadata": {},
   "outputs": [],
   "source": [
    "new_nodes = new_pg_pipeline.run(documents=docs)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 23,
   "id": "e9ff66a4-2908-40d2-9e0d-e9890ff7f34a",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "21"
      ]
     },
     "execution_count": 23,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "len(new_nodes)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 30,
   "id": "b563ee37",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Name: llama-index-core\n",
      "Version: 0.10.14.post1\n",
      "Summary: Interface between LLMs and your data\n",
      "Home-page: https://llamaindex.ai\n",
      "Author: Jerry Liu\n",
      "Author-email: jerry@llamaindex.ai\n",
      "License: MIT\n",
      "Location: /opt/homebrew/lib/python3.11/site-packages\n",
      "Requires: aiohttp, dataclasses-json, deprecated, dirtyjson, fsspec, httpx, llamaindex-py-client, nest-asyncio, networkx, nltk, numpy, openai, pandas, pillow, PyYAML, requests, SQLAlchemy, tenacity, tiktoken, tqdm, typing-extensions, typing-inspect\n",
      "Required-by: llama-index, llama-index-agent-openai, llama-index-embeddings-openai, llama-index-llms-openai, llama-index-multi-modal-llms-openai, llama-index-postprocessor-colbert-rerank, llama-index-postprocessor-flag-embedding-reranker, llama-index-program-openai, llama-index-question-gen-openai, llama-index-readers-file, llama-index-retrievers-bm25, llama-parse\n"
     ]
    }
   ],
   "source": [
    "!pip3 show llama-index-core"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "d073fa97-e6bf-4527-9b54-8fc2fd7dd8a4",
   "metadata": {},
   "source": [
    "We can now build our query engine as before."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 24,
   "id": "bf774ab9-0ec8-4d50-9337-adde4bceb586",
   "metadata": {},
   "outputs": [],
   "source": [
    "from llama_index.core.indices import VectorStoreIndex"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 25,
   "id": "cb8af5df-07ea-466a-9e98-17e643baf153",
   "metadata": {},
   "outputs": [],
   "source": [
    "index = VectorStoreIndex(nodes=new_nodes)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 26,
   "id": "a6da8e5d-68df-4fce-8132-c295730ecfa9",
   "metadata": {},
   "outputs": [],
   "source": [
    "query_engine = index.as_query_engine()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 27,
   "id": "ef68e619-4335-4358-8fed-d4602eaaa99a",
   "metadata": {},
   "outputs": [],
   "source": [
    "response = query_engine.query(\"\"\"\\\n",
    "In the essay, the author mentions his early experiences with programming. \n",
    "Describe the first computer he used for programming, the language he used, and the challenges he faced.\n",
    "\"\"\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 28,
   "id": "55a2ed3d-f7f1-4786-bb14-fa39326e8d6e",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "The author mentions using the IBM 1401 computer for programming during his early experiences. The language he used on this computer was an early version of Fortran. One of the challenges he faced was the limited input options for programs, as the only form of input was data stored on punched cards, which he did not have access to. This limitation led to difficulties in finding meaningful tasks to perform with the computer.\n"
     ]
    }
   ],
   "source": [
    "print(response)"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.11.6"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}
